MapReduce with Deltas
Abstract
The MapReduce programming model is extended conservatively to deal with deltas for input data such that recurrent MapReduce computations can be more efficient for the case of input data that changes only slightly over time. That is, the extended model enables more frequent re-execution of MapReduce computations and thereby more up-to-date results in practical applications. Deltas can also be pushed through pipelines of MapReduce computations. The achievable speedup is analyzed and found to be highly predictable. The approach has been implemented in Hadoop, and a code distribution is available online. The correctness of the extended programming model relies on a simple...
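The idea of re-running a MapReduce job on a delta rather than on the full input can be illustrated with a small word-count sketch. This is not the paper's Hadoop implementation: the function names (`map_words`, `reduce_counts`, `apply_delta`) are hypothetical, and the sketch assumes the reduction is an invertible sum, so that deleted records can be handled by subtracting their contribution.

```python
from collections import Counter

def map_words(line):
    # Map step: emit a (word, 1) pair for every word in the line.
    return [(word, 1) for word in line.split()]

def reduce_counts(pairs):
    # Reduce step: sum the values for each key.
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals

def map_reduce(lines):
    # Full computation over a complete input.
    return reduce_counts(pair for line in lines for pair in map_words(line))

def apply_delta(previous, inserted, deleted):
    # Incremental computation (assumption: reduce is an invertible sum).
    # Only the changed lines are mapped and reduced; the result is merged
    # into the previous output instead of recomputing everything.
    result = Counter(previous)
    for word, n in map_reduce(inserted).items():
        result[word] += n
    for word, n in map_reduce(deleted).items():
        result[word] -= n
    # Drop keys whose count fell to zero so the result matches a full run.
    return Counter({word: n for word, n in result.items() if n != 0})

old_input = ["a b a", "c b"]
new_input = ["a b a", "c d"]  # the line "c b" was replaced by "c d"

previous = map_reduce(old_input)
incremental = apply_delta(previous, inserted=["c d"], deleted=["c b"])
assert incremental == map_reduce(new_input)
```

In this sketch the incremental run touches two lines regardless of how large the unchanged part of the input is, which is the kind of saving the abstract describes for inputs that change only slightly over time.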
Similar resources
REX: Recursive, Delta-Based Data-Centric Computation
In today’s Web and social network environments, query workloads include ad hoc and OLAP queries, as well as iterative algorithms that analyze data relationships (e.g., link analysis, clustering, learning). Modern DBMSs support ad hoc and OLAP queries, but most are not robust enough to scale to large clusters. Conversely, “cloud” platforms like MapReduce execute chains of batch tasks across clus...
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. The current Hadoop and the existing Hadoop distributed file system's rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and that the same workload is assigned to each node. Default Hadoop d...
Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming
The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations, because the rapid development in the use of information technology in general, and network technology in particular, has led many organizations to make their applications available for use via electronic platforms hos...
View Maintenance using Partial Deltas
This paper addresses maintenance of materialized views in a warehousing environment, where views reside on a remote database. We analyze so-called Change Data Capture (CDC) techniques used to capture changes (also referred to as deltas) at the source systems. We show that many existing CDC techniques do not provide complete deltas but rather incomplete (or partial) deltas. Traditional view maintenanc...
Topography of inland deltas: Observations, modeling, and experiments
The topography of inland deltas is influenced by the water‐sediment balance in distributary channels and local evaporation and seepage rates. In this letter a reduced complexity model is applied to simulate inland delta formation, and results are compared with the Okavango Delta, Botswana and with a laboratory experiment. We show that water loss in inland deltas produces fundamentally diffe...
Publication date: 2011